The management of customer services by telephone encounters several problems: an 
uncontrollable flow of calls, complicated resource management, a very high cost of 
service, and more. Opportunities to improve the quality of service and to save time and 
money have triggered the widespread adoption of artificial intelligence (AI) based 
callbots. This article outlines the workflow developed to model the architecture of a 
callbot. To this end, several algorithms were evaluated and compared on real data from 
the call center of an insurance company. The algorithms considered are: k-nearest 
neighbours (KNN), support vector machine (SVM), random forests (RF), logistic 
regression (LR), and Naive Bayes (NB). The comparison criteria are: correct responses, 
response time, accuracy, Cohen’s kappa and F1 score, using n-gram (1,1) and (2,2). 
The results obtained show that the SVM (accuracy=70.29%) presents the best results on 
all the comparison criteria. The comparison between the results of the human agents and 
the callbot shows an improvement on several levels: the cost savings are greater than 
80% on all the tests carried out, the holding time decreases to 0 seconds, and the 
processing time is more than halved. The results obtained sufficiently meet the 
objectives of this project. 


This is an open access article under the CC BY-SA license. 


Department of Math and computer, Faculty of Sciences Ain Chock, Hassan II University 


Casablanca, Morocco 


Email: imad.aatt@gmail.com 


1. INTRODUCTION 


Many industrial and service companies nowadays offer their customers a remote service by telephone. 
Customer service departments are therefore increasingly present to meet the needs of customers, and with 
them a new concept of call center agent was born: the callbot. This concept is part of the technology family 
used to streamline communication with customers, such as voice agents, phonebots, conversational agents, 
and more. On the other hand, several difficulties are identified in the management of customer services, 
among them a large number of customer calls, which leads to an enormous processing cost, and great 
difficulty in providing a good-quality 24/7 service. The callbot is a less well-known solution than another 
variant of automatic systems, the chatbot [1]-[10]. A callbot is an artificial intelligence (AI) that can 
manage a dialogue with a customer during a telephone call, identify the customer's need, and resolve it 
autonomously, 24 hours a day, without waiting time. This article outlines the workflow developed to model 
the callbot architecture. In this context, the machine has the role of understanding the dialogue and 
interacting with the client. This approach requires several design and implementation axes, namely 
understanding human language and responding in the same way. The rest of this article is organized as 
follows: section 2 reviews related works. Section 3 contains the research methodology and details the 
proposed architecture. The results and discussion are presented in section 4. Section 5 contains the conclusion. 


2. RELATED WORKS 


This paper is a continuation of our previous works concerning the realization of our 
AI-based callbot project. In the literature, most research papers deal with the chatbot problem, although 
researchers have become increasingly interested in callbots in recent years. The work [1] listed 10 machine 
learning algorithms (MLA) for training chatbots and disclosed the information technology (IT) architecture 
of a chatbot platform using a natural language understanding (NLU) engine based on a Bayesian learning 
method; the results of testing this chatbot with users show good satisfaction (75% of users are comfortable 
with chatbot conversation). The article [2] gives a formal process description for realizing a chatbot, and 
presents a prototype tool that transforms a process model in business process model and notation (BPMN) 
into a chatbot defined in artificial intelligence markup language (AIML). The study conducted in the paper 
[3] compared two models for building chatbots, one based on sequence-to-sequence and one on AIML. The 
result obtained concludes that the sequence-to-sequence model had a better information retrieval rate, while 
the AIML chatbot ensured a better task completion rate and user satisfaction. Augello et al. [4] created a 
social chatbot model based on social-AIML (SAIML); the proposed architecture allowed a more exact 
interpretation of user sentences by highlighting social practice in the deliberative process of an agent. In 
order to improve the overall recommendation process, the authors of the article [5] simplified the 
human-chatbot conversation with the conversational parameters supported by default. The proposed approach 
aims to avoid inconsistencies during the interactions with the chatbot. The project presented in the work [6] 
investigates the use of chatbots to provide negotiation facilities and the incorporation of chatbots into an 
open learner modeling environment. Shalaby et al. [7] describe the main steps for developing a 
conversational virtual agent to understand and respond to complaints related to vehicle equipment. The 
results obtained show that the approach adapts better to a large volume of features, is up to 30% more 
accurate, and is better at understanding user utterances with domain-specific entities. Cerezo et al. present 
the implementation of a chatbot developed for the Pharo software ecosystem. The chatbot includes several 
components: the Discord application programming interface (API), term frequency (TF) and inverse document 
frequency (IDF) algorithms to perform sentence classification and key-concept collection respectively, and 
an expert recommendation system. The system was tested by the Pharo community, but the conversational 
behavior of the chatbot did not meet users’ expectations. The work [9] develops an architecture for a 
Messenger chatbot using Amazon Web Services; the architecture has proven to be extensible and scalable. 


One of the important phases in developing the architecture of our callbot is to define the decision 
algorithm module. In the following, we present a comparative study of MLA. The MLA find their applications 
in several areas, namely: text classification [13]-[17], medical diagnosis [18], pollution prediction [19], 
spam email detection [20], plant disease identification [21], and stock daily trading [22]. For example, the 
paper describes the use of the KNN algorithm with the TF-IDF method for text classification. The results 
obtained show that this combination proved to be a good choice with changes in their implementation. The 
work [14] presents a study of a BBC news text classification system. The algorithms covered are k-nearest 
neighbor (KNN), random forest (RF), and logistic regression (LR). In this experiment, the combination of the 
TF-IDF vectorizer and the LR classifier attains the highest accuracy, 97% on the data set. The RF classifier 
gave an accuracy of 93%. With an overall accuracy of 92%, KNN was the algorithm with the lowest accuracy. 
Across all parameters, the LR classifier performed as expected. The study shows that the effectiveness of 
classifiers trained on different text corpora differs, and deduces that classifier performance depends to 
some degree on the training corpus, so that high-quality training corpora may yield classifiers with good 
performance. The authors of the paper compare seven ML algorithms: LR, KNN, support vector machine 
(SVM), Naive Bayes (NB), decision tree (DT), RF, and AdaBoost (AB) on the Pima Indian Diabetes (PID) dataset 
to predict diabetes. They found that the models with LR and SVM work well for diabetes prediction. The 
authors of the study tested 12 MLA to predict costs and carbon dioxide emission in an integrated 
energy-water optimization model and considered four indices to examine the prediction accuracy of the 
algorithms. The light gradient boosting machine and extra tree algorithms achieved higher prediction 
accuracy in this research than the other algorithms. To detect spam emails, the work proposes a hybrid 
bagging approach that implements NB and J48 DT MLA. The proposed hybrid bagged approach achieved an overall 
accuracy of 88.12%. The contribution of the paper is a comparison between the trading performance of deep 
neural network (DNN) algorithms and traditional MLA in the Chinese stock market and the US stock market. The 
experimental results on S&P 500 index component stocks (SPICS) and 185 CSI 300 index component stocks 
(CSICS) show that some traditional MLA perform better than DNN algorithms on most directional evaluation 
indicators. The paper gives a systematic review of MLA in recommender systems, which can help application 
developers to deal with the algorithms, their types, and trends in the use of specific algorithms. This work 
also details classes of evaluation metrics and ranks the MLA based on these metrics. 


3. RESEARCH METHODOLOGY 
3.1. The proposed architecture 

In this work, we discuss the proposed architecture of the callbot system. Figure 1 shows the components 
of the system. The architecture includes several components: the private branch exchange (PBX) server 
[24], the automatic speech recognition (ASR) module, the natural language processing (NLP) module, the 
decision-making module, and the text to speech (TTS) module. 




Figure 1. The proposed architecture of the AI-based enterprise callbot 


The PBX server is a private telephone network capable of handling communication between users 
of the telephone system within the same network or with external users, using technologies such as voice 
over internet protocol (VoIP). The callbot process is initiated by the customer’s call, which can be made 
from a traditional telephone line (landline or mobile) or over VoIP. The PBX server switches the call to the 
company’s internal telephone system. The ASR module transcribes the client’s request from voice to text. 
This text is then processed by the NLP module and passed to the decision-making module. Based on the 
knowledge base, the decision-making module generates the best answer(s) before transmitting them to the 
client. The TTS module transforms the response from text into a human-sounding voice. The PBX again acts as 
a switch to transfer the answer to the customer. 
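
The order in which the modules hand data to one another can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`handle_call`, `fake_asr`, `fake_nlp`, `fake_decide`, `fake_tts`) and the one-entry knowledge base are hypothetical stand-ins, not the components used in the project, and the PBX switching step is left out.

```python
from typing import Callable


def handle_call(audio: bytes,
                asr: Callable[[bytes], str],
                nlp: Callable[[str], list[str]],
                decide: Callable[[list[str]], str],
                tts: Callable[[str], bytes]) -> bytes:
    """Run one customer turn through the four modules, in order."""
    text = asr(audio)        # ASR: speech -> text
    tokens = nlp(text)       # NLP: text -> normalized tokens
    answer = decide(tokens)  # decision-making: tokens -> best answer
    return tts(answer)       # TTS: answer text -> voice


# Toy stand-ins so that the sketch runs; real modules would replace them.
KNOWLEDGE_BASE = {"certificate": "Your certificate has been sent by email."}


def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_nlp(text: str) -> list[str]:
    return text.lower().split()


def fake_decide(tokens: list[str]) -> str:
    for token in tokens:
        if token in KNOWLEDGE_BASE:
            return KNOWLEDGE_BASE[token]
    # Requests outside the knowledge base go to a human agent.
    return "Transferring you to an agent."


def fake_tts(answer: str) -> bytes:
    return answer.encode("utf-8")


reply = handle_call(b"I need my certificate",
                    fake_asr, fake_nlp, fake_decide, fake_tts)
fallback = handle_call(b"hello", fake_asr, fake_nlp, fake_decide, fake_tts)
```

Passing the modules as callables keeps each one replaceable, which mirrors the modular layout of the architecture.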

ASR is the technique that allows a program to transform human speech from its vocal format into a 
text format that the machine can use. It is also called speech to text. Figure 2 shows how ASR works: the 
module takes the audio signal as input; the “signal processing and feature extraction” part cleans the 
signal and suppresses noise to improve speech quality; the signal is then converted from the time domain to 
the frequency domain. The “hypothesis search” module combines the results of the two modules, “the acoustic 
model” and “the language model”, in order to output the sequence of words with the best score. 


Figure 2. The architecture of ASR system 
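
The conversion from the time domain to the frequency domain mentioned above can be illustrated with a direct discrete Fourier transform (DFT). The text does not say which transform or frame size the ASR module uses, so the DFT, the 16-sample frame, and the function name `dft_magnitudes` are assumptions made for this sketch only.

```python
import cmath
import math


def dft_magnitudes(samples: list[float]) -> list[float]:
    """Return |X[k]| for k = 0..N-1 using the direct O(N^2) DFT definition."""
    n = len(samples)
    magnitudes = []
    for k in range(n):
        total = 0j
        for t, x in enumerate(samples):
            total += x * cmath.exp(-2j * math.pi * k * t / n)
        magnitudes.append(abs(total))
    return magnitudes


# A pure tone that completes 2 cycles over a 16-sample frame.
N = 16
frame = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
spectrum = dft_magnitudes(frame)

# Search the first half of the spectrum (the second half mirrors it).
peak_bin = max(range(N // 2), key=lambda k: spectrum[k])
```

The energy of the tone concentrates in bin 2, which is the kind of frequency-domain feature that the acoustic model can then score.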
3.2. NLP module 


Conversational AI, such as callbots, is one of the most common contexts in which NLP [26]-[29] 
is applied. The purpose of this technique is to allow a fluid and natural conversation between the machine 
and the human: the bot takes part in a conversation that is as human as possible, moving away from static 
pre-scripted responses. Our callbot is designed to communicate with customers and give them the impression 
that they are talking to a person and not a bot. However, during a conversation customers may make 
unnecessary sounds or ask the same question in different ways. Therefore, we need to preprocess the data so 
that the engine can easily understand it. The main task of the callbot is to understand the customer’s 
need, called the intent. Suppose a client wishes to report a claim: the customer can express this need in 
several possible ways. We must identify the intent and the context, and take into account everything that 
is discussed during the call, because the client wants to get the right answer. For example, the customer 
may say “I want to report a water leak in my bathroom” or “I have a water damage problem”. The callbot must 
conclude that the customer wishes to report a claim relating to his home insurance contract. 


Figure 3 shows the phases of the text normalization process, from the input speech to the valid output 
answer. The first step is sentence segmentation, which divides the input text into separate sentences; the 
clear special characters step then eliminates special characters from the text. Tokenization is the phase 
that splits the text into single words. These “tokens” allow the system to identify the basic words in the 
text before processing the material further. Stop words are words that carry little meaning for search 
queries. Typically, these words are excluded from search requests because they return a lot of unnecessary 
information. The stemming step reduces words to their root form in the French language, which avoids the 
problem of a single word being expressed in different forms. The lemmatization stage is the algorithmic 
process of determining the lemma (canonical form) of a word depending on its intended meaning. 


Figure 3. Flow chart of the text normalization process 
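
The normalization steps listed above can be sketched with the Python standard library. Everything specific here is a toy assumption: the project works on French with its own resources, whereas the stop-word list, the suffix rules, and the lemma dictionary below are tiny English placeholders chosen so that the example is easy to read. As a simplification, a word is lemmatized by dictionary lookup when it is known and stemmed by suffix stripping otherwise.

```python
import re

# Toy resources (assumptions for illustration only).
STOP_WORDS = {"i", "a", "an", "the", "in", "my", "to", "have", "is", "it"}
LEMMAS = {"leaks": "leak", "leaking": "leak", "wants": "want"}
SUFFIXES = ("ing", "ed", "s")


def split_sentences(text: str) -> list[str]:
    """Sentence segmentation on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def clear_special_characters(sentence: str) -> str:
    """Replace everything except word characters and whitespace."""
    return re.sub(r"[^\w\s]", " ", sentence)


def tokenize(sentence: str) -> list[str]:
    return sentence.lower().split()


def remove_stop_words(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text: str) -> list[list[str]]:
    """Return one list of normalized tokens per input sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = remove_stop_words(tokenize(clear_special_characters(sentence)))
        result.append([LEMMAS.get(t, stem(t)) for t in tokens])
    return result


normalized = normalize("I want to report a water leak in my bathroom! It is leaking.")
```

Both sentences end up sharing the token `leak`, which is the effect the normalization aims for: different surface forms of the same request map to the same features.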
3.3. Callbot user interface 

The callbot’s user interface (UI) allows users to develop the scenarios that the machine will use to 
meet the customer’s request. It is the interface through which the knowledge base is built graphically, 
without any prior technical knowledge. Figure 4 shows the configuration of the scenario responding to 
the customer request “receive a home insurance certificate”. Figure 5 shows a real client request. In 
addition to the scenario configuration shown in Figure 4, the interface provides conversation monitoring, 
customer information (phone number, name, contract number), and call recording, so that the supervisor can 
follow the operation of the bot from end to end, as shown in Figure 5. 




Figure 4. Example of a simple scenario 


[2021-10-25 16:40:32] Callbot: Welcome to the ***** ASSURANCE customer service. I am your assistant. 
[2021-10-25 16:40:37] Callbot: Are you Mr. *****? 
[2021-10-25 16:40:48] Customer: Yes. 
[2021-10-25 16:41:05] Customer: I would like my insurance certificate for the future, the coming year, home insurance. 
[2021-10-25 16:41:05] Callbot: Would you like to know how to obtain an insurance certificate? 
[2021-10-25 16:41:14] Customer: Yes. 
[2021-10-25 16:41:14] Callbot: It is very simple, I am sending you your certificate by email right away. You can also find it in your personal space or in our mobile application by clicking on the link just sent by email and SMS. 
[2021-10-25 16:41:30] Callbot: ***** ASSURANCE thanks you for your call. Goodbye. 


Figure 5. Illustration of conversation between the customer and the callbot 


3.4. Decision-making module 

This module’s role is to find the answer to the customer’s request. It includes the callbot UI (CUI), 
the knowledge base, and the MLA. The CUI is a graphical tool developed to allow “business” users to manage 
the callbot system. Its main role is to create, modify, or delete scenarios that feed the knowledge base of 
our system. This tool also makes it possible to monitor the system operations in real time. The CUI helps 
users create simple scenarios capable of giving a direct answer, or complicated scenarios with several 
interactions. Moreover, the algorithms listed in Table 1 are among the most widely used in ML, hence we 
compared them in order to find the best one for our study. The comparison criteria considered are: the 
average of correct answers, the response time, the accuracy, Cohen’s kappa, and F1 score. 


Table 1. List of ML algorithms 

Algorithm | Description | Algorithm author 
SVM | The goal is to find the hyperplanes that separate the data points with the maximum margin between the two decision boundaries. To minimize generalization error, SVMs have the advantage of reducing the risk of overfitting. | Cortes and Vapnik 
RF | A meta-estimator that fits a number of DT classifiers on different sub-samples of the data set and uses averaging to improve predictive accuracy and control overfitting. | Breiman 
KNN | KNN is a simple MLA. The purpose of the algorithm is to categorize objects into one of the predefined classes of a sample group created by ML. | Altman 
LR | LR is a standard probabilistic statistical classification model that has been used widely in disciplines such as computer vision, marketing, and social sciences, to name a few. | Tolles and Meurer [33] 
Naive Bayes (NB) | NB is a very practical Bayesian learning method. It assumes that the feature values are conditionally independent given the target value, which significantly reduces the computation cost. | Thomas Bayes (1702-61) 
DT | The DT is derived from a set of labelled training instances represented by an array of attribute values and a class label. | Belson 
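
To make the role of a classifier in the decision-making module concrete, here is a minimal multinomial Naive Bayes intent classifier written from scratch. It is a sketch under stated assumptions: the class name `NaiveBayesIntentClassifier`, the intent labels, and the four training utterances are invented for the example, and Laplace smoothing with `alpha=1.0` is a common default rather than a setting reported for the project.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes over token lists, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents per intent
        self.word_counts = defaultdict(Counter)  # token counts per intent
        self.vocabulary = set()

    def fit(self, documents, labels):
        for tokens, label in zip(documents, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(intent) + sum of log P(token | intent)
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha)
                                  / (total_words + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training utterances, already normalized into tokens.
train_docs = [
    ["report", "water", "leak"],
    ["water", "damage", "problem"],
    ["need", "insurance", "certificate"],
    ["send", "home", "certificate"],
]
train_labels = ["report_claim", "report_claim",
                "get_certificate", "get_certificate"]

model = NaiveBayesIntentClassifier().fit(train_docs, train_labels)
claim_intent = model.predict(["water", "leak", "bathroom"])
certificate_intent = model.predict(["certificate"])
```

The conditional independence assumption shows up in the inner loop: each token contributes its own log-probability term, independently of the others.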


4. RESULTS AND DISCUSSION 


4.1. Algorithms comparison 


This section has two parts. The first compares the MLA in order to select the most suitable one for 
the decision-making module. The second presents the main statistics on the use of the proposed strategy in 
the call service system. To compare the results of the algorithms used in this article, we split our data 
source into two parts: a training part representing 80% of the instances, and the remaining fifth of the 
instances, used for model validation. The results are presented in the form of tables and figures in the 
same part. 
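
The 80%/20% split can be sketched as follows. The text does not say whether the instances were shuffled or stratified by intent, so the shuffle with a fixed seed, the function name `train_validation_split`, and the synthetic examples are assumptions made for illustration.

```python
import random


def train_validation_split(instances, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the instances and hold out a validation fraction."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Synthetic (utterance, intent) pairs standing in for the real data source.
examples = [(f"utterance {i}", f"intent {i % 5}") for i in range(100)]
train, validation = train_validation_split(examples)
```

A fixed seed makes the split reproducible, so every algorithm in the comparison is trained and validated on the same instances.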


4.1.1. Comparison using n-gram (2,2) 


From Table 2 and Figure 6, we deduced that KNN, Naive Bayes, SVM and RF (50) offered 
an excellent response time, which does not exceed 20 ms. The rest of the algorithms take an 
average time varying between 30 and 60 ms. For the percentage of correct responses, the results are fair 
except for the KNN algorithms, where the results are very poor. SVM gives the best results (average correct 
answers=65.87% and accuracy=66.13%). 
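
The n-gram settings compared in this section can be illustrated with a small feature extractor. This sketch assumes that (1,1) and (2,2) denote the minimum and maximum n-gram length over words, as in the `ngram_range` convention of common text vectorizers; the text does not define the notation, and the function name `word_ngrams` is hypothetical.

```python
from collections import Counter


def word_ngrams(tokens, n_min, n_max):
    """Return all word n-grams with n_min <= n <= n_max, in order."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


tokens = "i have a water damage problem".split()
unigrams = word_ngrams(tokens, 1, 1)   # n-gram (1,1): single words
bigrams = word_ngrams(tokens, 2, 2)    # n-gram (2,2): pairs of adjacent words
bigram_counts = Counter(bigrams)       # bag-of-bigrams feature vector
```

Bigrams keep some word order ("water damage") at the cost of sparser features, which is one plausible reason why the two settings rank the algorithms differently.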


Table 2. Performance of algorithms using n-gram (2,2) 

List of algorithms | Average correct answer (%) | Response time (s) | Accuracy (%) 
SVM (svc) | 65.87 | 0.1428 | 66.13 
RF (200) | 60.12 | 0.6014 | 56.12 
RF (100) | 60.18 | 0.3230 | 56.14 
RF (50) | 61.02 | 0.1842 | 58.12 
DT | 59.34 | 0.0909 | 59.30 
KNN (n=5) | 10.50 | 0.0383 | 16.01 
KNN (n=3) | 33.49 | 0.0382 | S132 
LR | 62.18 | 0.3194 | 65.14 
NB | 63.32 | 0.0387 | 65.88 
Figure 6. Performance of algorithms using n-gram (2,2) 


4.1.2. Comparison using n-gram (1,1) 

The results in Table 3 and Figure 7 show that all algorithms except RF (200 and 100) present an 
excellent response time, which does not exceed 25 ms. For the percentage of correct responses, 
there is an improvement in the overall results, but these results are still fair. RF (200)/(100) and 
SVM give us the best results (64.18%, 61.94% and 61.94%). 

In order to push the comparison further, two other comparison criteria are applied: Cohen’s kappa 
and F1 score. As presented in Table 4 and Figure 8, the Cohen’s kappa comparison for n-gram (1,1) gives the 
following: the strength of agreement of KNN (n=5) is slight (kappa between 0 and 0.20), the strength of 
agreement of KNN (n=3) is fair (kappa between 0.21 and 0.40), and the strength of agreement of the rest of 
the algorithms is substantial (kappa between 0.61 and 0.80). For n-gram (2,2), the strength of agreement of 
KNN (n=5) is slight (kappa between 0 and 0.20), the strength of agreement of KNN (n=3) is fair (kappa 
between 0.21 and 0.40), the strength of agreement of RF and DT is moderate (kappa between 0.41 and 0.60), 
and the strength of agreement of LR, NB and SVM is substantial (kappa between 0.61 and 0.80). 

Based on the F1 score, the results of the SVM algorithm are the best (0.62 for n-gram (2,2) and 0.60 for 
n-gram (1,1)); NB and LR rank second (F1 score between 0.56 and 0.60). KNN presents weak results 
(maximum F1 score of 0.32), and the rest of the algorithms present middling results. With n-gram (1,1), 
accuracy and Cohen’s kappa are better; with n-gram (2,2), the average of correct answers and the F1 score 
are better. From all the results described in this section, the SVM algorithm presents the best results on 
all the comparison criteria. 
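
For readers who want to reproduce the two additional criteria, the sketch below computes Cohen’s kappa and an F1 score from true and predicted labels and maps kappa to the usual Landis and Koch agreement bands (slight, fair, moderate, substantial). The text does not state how the F1 score was averaged across intents, so macro-averaging is an assumption here, and the ten labels are invented for the example.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """kappa = (p_o - p_e) / (1 - p_e) for two label sequences."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1:
        return 1.0  # degenerate case: a single class everywhere
    return (observed - expected) / (1 - expected)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def agreement_strength(kappa):
    """Landis and Koch interpretation bands for kappa."""
    if kappa < 0:
        return "poor"
    for upper, name in [(0.20, "slight"), (0.40, "fair"),
                        (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return name
    return "almost perfect"


y_true = ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c"]
y_pred = ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c"]
kappa = cohen_kappa(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
band = agreement_strength(kappa)
```

Unlike accuracy, kappa subtracts the agreement expected by chance, which matters when some intents are much more frequent than others.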


Table 3. Performance of algorithms using n-gram (1,1) 

List of algorithms | Average correct answer (%) | Response time (s) | Accuracy (%) 
SVM (svc) | 61.84 | 0.1438 | 70.29 
RF (200) | 53.24 | 0.6210 | 63.24 
RF (100) | 58.00 | 0.3254 | 67.65 
RF (50) | 53.83 | 0.1864 | 64.37 
DT | 54.63 | 0.0545 | 58.11 
KNN (n=5) | 7.17 | 0.0311 | 14.06 
KNN (n=3) | 15.29 | 0.0378 | 25.08 
LR | 58.92 | 0.2273 | 69.45 
NB | 58.89 | 0.2059 | 68.41 
Figure 7. Performance of algorithms using n-gram (1,1) 




Figure 8. Presentation of results of F1 score and Cohen’s kappa 
4.2. Main statistics of using the callbot strategy 

To measure the changes observed after introducing the callbot system in the handling of customer calls, 
we first collected statistics over a period of 7 months spanning 2020 and 2021, covering the following 
variables: the number of calls handled, the cost of calls, and the holding time. The callbot statistics are 
collected in 4 different production runs of 2 hours each. This short deployment time was chosen so as not to 
impact the quality of the service. The callbot is configured to handle 50 questions only. The rest of the 
requests are transferred to the agents. 


4.2.1. Hold time 

Table 5 presents the customer waiting times before the implementation of the callbot; Table 6 and 
Figure 9 show the average time that customers wait before their call is processed during the callbot tests. 
Without the callbot, the waiting time varies from 90 seconds to more than 300 seconds. From these values, we 
can see that the waiting time affects the quality of the service. 


Table 5. Customer waiting time before contacting an agent (without callbot) 

Month | Hold time (seconds) 
09/2020 | 322 
10/2020 | 90 
11/2020 | 109 
12/2020 | 145 
01/2021 | 111 
02/2021 | 134 
03/2021 | 100 

Table 6. Customer waiting time before processing calls when testing the callbot 

Test | Callbot | Agents 
Test N° 1 | 0 s | 156 s 
Test N° 2 | 0 s | 173 s 
Test N° 3 | 0 s | 40 s 
Test N° 4 | 0 s | 108 s 



Figure 9. Customer waiting time before processing calls when testing the callbot 


From Table 6 and Figure 9, we clearly deduce that the waiting time for calls handled by the callbot is 
very short (near 0 seconds), meaning that the customer request is processed immediately, whereas the calls 
handled over the same period by the agents are always processed after a significant waiting time. We can 
also note that the implementation of the callbot does not necessarily improve the waiting time for calls 
outside the callbot’s scope. 
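
The waiting-time figures from Tables 5 and 6 can be summarized with a few lines of Python. The values are copied from the two tables; the variable names are chosen for this sketch, and the averages are simple arithmetic means that are not reported in the tables themselves.

```python
from statistics import mean

# Table 5: monthly hold time without the callbot, in seconds.
hold_without_callbot = {
    "09/2020": 322, "10/2020": 90, "11/2020": 109, "12/2020": 145,
    "01/2021": 111, "02/2021": 134, "03/2021": 100,
}

# Table 6: hold time during the four callbot tests, in seconds.
callbot_hold = [0, 0, 0, 0]
agent_hold = [156, 173, 40, 108]

baseline_mean = mean(hold_without_callbot.values())
agent_test_mean = mean(agent_hold)
callbot_test_mean = mean(callbot_hold)
shortest = min(hold_without_callbot.values())
longest = max(hold_without_callbot.values())
```

The agents' average during the tests stays close to the baseline average, which is consistent with the remark that the callbot does not shorten the wait for calls it does not handle.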


4.2.2. Cost and duration of the treatment 


The processing time is the duration of the call between the advisor and the client. Prompt handling 
of the call improves customer satisfaction. Table 7 presents the changes in the number of calls and their 
costs. Table 8 compares the cost of calls handled by agents with the cost of calls handled by the callbot 
and gives the resulting benefit. Figure 10 presents the cost improvement between the callbot and calls 
handled by agents. Finally, Table 9 and Figure 11 present the duration of calls handled by the callbot 
against that of calls handled by the agents. 


Table 7. Example of changes in the number of calls/costs over a period of 7 months 

Month | Number of calls | Cost 
09/2020 | 35,012 | 140,048 
10/2020 | 37,046 | 148,184 
11/2020 | 35,430 | 141,720 
12/2020 | 35,899 | 143,596 
01/2021 | 40,873 | 163,492 
02/2021 | 38,174 | 152,696 
03/2021 | 39,149 | 156,596 
Average | 37,369 | 149,476 


Table 8. Cost of calls when set up and the benefit compared to the agent call handling model 

Test | Calls | Cost callbot | Cost agents | Saving 
Test N° 1 | 52 | 23.93 | 208.00 | 88% 
Test N° 2 | 54 | 34.12 | 216.00 | 84% 
Test N° 3 | 53 | 33.14 | 212.00 | 84% 
Test N° 4 | 79 | 28.69 | 316.00 | 91% 




Figure 10. Presentation of cost improvement between the callbot and calls handled by agents 
Table 9. Duration of calls handled by callbot vs calls handled by agents 

Test | Callbot (s) | Agents (s) 
Test N° 1 | 100 | 261 
Test N° 2 | 80 | 280 
Test N° 3 | 92 | 259 
Test N° 4 | 112 | 275 




Figure 11. Duration of calls handled by callbot vs calls handled by agents 


Table 7, Table 8 and Figure 10 confirm that the cost of calls handled by the callbot is much lower 
than the cost of calls handled by the agents. The cost savings are greater than 80% on all the tests carried 
out. From Table 9 and Figure 11, we find that the callbot’s processing time varies between 1 minute 20 
seconds and 2 minutes, while the processing time by human agents varies on average between 4 minutes 20 
seconds and 4 minutes 40 seconds. The gain in processing time is very significant for calls handled by the 
callbot, so better customer satisfaction can be expected. 
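
The savings in Table 8 and the duration gains in Table 9 follow from simple arithmetic, which the sketch below reproduces. The figures are copied from the two tables; the formula saving = 1 - cost callbot / cost agents is inferred from the reported percentages rather than stated in the text, and the currency unit is not specified.

```python
# (calls, cost_callbot, cost_agents, callbot_seconds, agent_seconds) per test,
# taken from Tables 8 and 9.
tests = [
    (52, 23.93, 208.00, 100, 261),
    (54, 34.12, 216.00, 80, 280),
    (53, 33.14, 212.00, 92, 259),
    (79, 28.69, 316.00, 112, 275),
]


def saving_percent(cost_callbot: float, cost_agents: float) -> float:
    """Relative cost saving of the callbot versus agents, in percent."""
    return 100 * (1 - cost_callbot / cost_agents)


savings = [round(saving_percent(bot, agents)) for _, bot, agents, _, _ in tests]
agent_cost_per_call = [agents / calls for calls, _, agents, _, _ in tests]
duration_ratios = [bot_s / agent_s for _, _, _, bot_s, agent_s in tests]
```

The rounded savings match the last column of Table 8, the agent cost works out to 4 per call in every test, and the callbot's call duration stays below half of the agents' duration in all four tests.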


5. CONCLUSION 

This paper presented the architecture of our callbot system built with NLP and ML techniques. 
The results obtained during this work showed that the SVM algorithm (accuracy=70.29% with n-gram (1,1)) 
is the best suited for implementing our decision module. The tests carried out using 
the proposed approach have led to very significant gains: a reduction in the cost of processing calls (the 
cost savings are greater than 80% on all the tests carried out), and an optimization of both the holding 
time and the call processing duration. Future work could include optimizing the model of the knowledge base 
in order to increase the number of calls processed by the proposed system. 